Which is more important in a concatenative text to speech system - pitch, duration, or spectral discontinuity?

Authors

  • Mike Plumpe
  • Scott Meredith
Abstract

This paper focuses on experimental evaluations designed to determine the relative quality of the components of the Whistler TTS engine. Eight different systems were compared pairwise to determine a rank ordering as well as a measure of the quality difference between the systems. The most interesting result is that the simple unit duration scheme used in Whistler was found to be very good, both in combination with natural acoustics and pitch and in combination with synthetic pitch. The synthetic pitch was found to be the aspect of the system that causes the greatest quality degradation.
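The abstract does not say how the pairwise judgments were turned into a rank ordering, so the following is only a minimal sketch of one common approach: score each system by its mean preference rate across pairings and sort. The system names `A`, `B`, `C`, the vote counts, and the `rank_systems` helper are all hypothetical and are not taken from the paper.

```python
# Hypothetical preference counts, NOT data from the paper:
# wins[(a, b)] = number of listeners who preferred system a over system b.
wins = {
    ("A", "B"): 14, ("B", "A"): 6,
    ("A", "C"): 17, ("C", "A"): 3,
    ("B", "C"): 12, ("C", "B"): 8,
}

def rank_systems(wins):
    """Rank systems by their mean preference rate over all pairings.

    This is an illustrative scoring rule, assumed for the sketch;
    the paper's actual method may differ.
    """
    systems = sorted({s for pair in wins for s in pair})
    scores = {}
    for s in systems:
        rates = []
        for other in systems:
            if other == s:
                continue
            won = wins.get((s, other), 0)
            lost = wins.get((other, s), 0)
            if won + lost:
                rates.append(won / (won + lost))
        scores[s] = sum(rates) / len(rates) if rates else 0.0
    # Highest mean preference rate first.
    ranking = sorted(systems, key=lambda s: scores[s], reverse=True)
    return ranking, scores

ranking, scores = rank_systems(wins)
print(ranking)
print(scores)
```

Under this scheme the gap between two systems' scores serves as a rough measure of their quality difference, in addition to the ordering itself.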

Similar resources

Improving speech synthesis of CHATR using a perceptual discontinuity function and constraints of prosodic modification

Concatenative synthesis is widely used in TTS to generate synthetic speech with high quality and relatively natural-sounding prosody. Whatever the type of synthesis unit used (diphone, phoneme, etc.), a large speech database is usually needed to ensure the phonetic and phonemic variation of the units in a rich variety of contexts. In the CHATR synthesis system, unit selection finds the most appr...

The Function of Pitch Range Variations in Samples of Emotional Expressions in Persian

This study investigates the interface between emotion and intonation patterns (more specifically, duration and pitch amplitude of speech). To this end, the acoustic properties of spectral parameters related to speech prosody are investigated. The results of acoustic and statistical analysis show that mean level and range of F0 in the contours vary strongly as a function of the degree o...

MIMIC : a voice-adaptive phonetic-tree speech synthesiser

This paper presents Mimic: a decision-tree-based, concatenative, voice-adaptive text-to-speech synthesiser. Mimic integrates text-to-speech synthesis (TTS) with speech recognition and speaker adaptation. Speech is synthesised from concatenation of triphone synthesis units. The triphone units are obtained from clusters of training examples modelled, labelled and segmented using clustered HMMs and...

Small footprint concatenative text-to-speech synthesis system using complex spectral envelope modeling

In this paper we present a method for speech modeling and its utilization in IBM’s small footprint concatenative text-to-speech system. The method is based on frequency-domain, complex spectral envelope modeling, where the phase component plays a crucial role in attaining high quality speech synthesis. The modeling scheme presented enables low bit rate compression of the amplitude and phase info...

Text-to-Speech Synthesis using Phoneme Concatenation

We propose a Text-To-Speech (TTS) synthesis system based on phonetic concatenation for unrestricted input text. The input text is first converted into a phonetic transcription using Letter-to-Sound rules. To synthesize new speech, the TTS system selects recorded phoneme units (PUs) from a database and modifies their duration according to a rule based on spelling, using Time Domain Pitch Synchron...

Publication date: 1998